Abstract


Latent change score models (LCSMs) proposed by McArdle (McArdle, 2000, 2009; McArdle & 
Nesselroade, 1994) offer a powerful tool for longitudinal data analysis. They are becoming 
increasingly popular in social and behavioral research (e.g., Gerstorf et al., 2007; Ghisletta & 
Lindenberger, 2005; King et al., 2006; Raz et al., 2008). Although conducting both univariate and
multivariate latent change score analysis is no longer a difficult task (e.g., Ghisletta &
McArdle, 2012; Zhang et al., 2015), there has been little discussion of design issues such as sample
size planning for LCSMs. To fill this gap, this study proposes a Monte Carlo based method to
determine the required sample size and the number of measurement occasions for both univariate 
and bivariate LCSMs. The method can obtain the power for testing each individual parameter of 
the models such as the change rate and coupling parameters. The Monte Carlo procedure is 
implemented and provided in a free R package RAMpath (Zhang et al., 2015). Examples for
sample size and measurement occasion planning for both univariate and bivariate LCSMs are
provided.

Sample Size and Measurement Occasion Planning for Latent Change Score Models through
Monte Carlo Simulation

Introduction 


Longitudinal data collection and data analysis are becoming a norm for psychological 
research (e.g., Grimm et al., 2016; McArdle & Nesselroade, 2014). A longitudinal design often 
involves data collection on multiple variables from multiple participants at multiple times. 
Despite the increased cost and complexity, collecting longitudinal data has many advantages.
For example, a longitudinal design naturally enables a researcher to study change and
related phenomena. In addition, inter-individual differences in change can also be investigated. 

Growth curve models are probably the most widely used technique for analyzing
longitudinal data, in part because a growth curve model can be fitted within the structural
equation modeling (SEM) framework (e.g., McArdle, 1986; McArdle & Epstein, 1987; McArdle
& Anderson, 1990; McArdle & Hamagami, 1992; McArdle, 1998; McArdle & Bell, 1998;
McArdle & Nesselroade, 2014). With the increasing use of longitudinal designs, it is not
surprising that more and more complex models and methods have been developed. For example, 
in order to deal with missing data, full information maximum likelihood methods, multiple 
imputation, and Bayesian methods have been developed and used (e.g., Enders, 2011; Lu et al., 
2013). To deal with non-normal data, robust methods have been proposed (e.g., Yuan & Zhang, 
2012; Zhang, 2013; Zhang et al., 2013). 

A difficult issue in longitudinal research is modeling nonlinear trajectories. As more
data are collected, a linear growth curve model is often no longer sufficient. When moving to
nonlinear models, issues such as computational difficulty can arise (e.g., Grimm et al., 2011;
Wang & McArdle, 2008). Linearizing a nonlinear model provides an efficient way to deal with
such difficulties. Although the method based on Taylor expansion is well known (e.g., Browne,
1993; Neale & McArdle, 2000), it is less well known that latent change score models (LCSMs)
provide a potentially more efficient way to model nonlinear trajectories.


Proposed by McArdle and colleagues, LCSMs combine difference equations with growth
curves to study change in longitudinal studies (e.g., McArdle, 2000; McArdle & Hamagami,
2001; Hamagami & McArdle, 2007a; Hamagami et al., 2010). In such models, change, which is
often the focus of a longitudinal study, is modeled directly. As we will show shortly, the models
can easily accommodate certain nonlinear growth trajectories. In addition to the univariate
LCSMs, bivariate LCSMs have also been proposed to model the inter-relationship between two
growth processes (e.g., McArdle & Hamagami, 2001).


Fitting an LCSM in the SEM framework is easy to understand but can be tedious. It can be
done in almost any SEM software. Recently, Ghisletta & McArdle (2012) showed how to
estimate a univariate LCSM using different R packages, including lavaan (Rosseel, 2012),
OpenMx (Boker et al., 2011) and sem (Fox, 2006). More recently, Zhang et al. (2015) automated
the estimation procedure for the typical univariate and multivariate LCSMs through an R package
RAMpath that is developed based on RAM notations (Boker et al., 2002; McArdle & Boker,
1990).


The importance of conducting statistical power analysis at the beginning of a study is 
universally accepted (e.g., Cohen, 1988; Hedges & Rhoads, 2010). Without adequate statistical 
power, the validity of statistical conclusions from all kinds of research is endangered (e.g., Cohen, 
1988; Hedges & Rhoads, 2010; Myors & Wolach, 2014; Shadish et al., 2002). For example, 
without a carefully planned sample size, a study can easily fail to detect an existing effect by 
chance, which in turn creates problems for replication or cross-validation. Although there are
studies on sample size planning and power calculation for growth curve analysis (e.g., Zhang &
Wang, 2009), we are not aware of any discussion on such design issues for LCSMs.


To fill the gap, this study proposes a Monte Carlo based method to determine the required
sample size and/or the number of measurement occasions for both univariate and bivariate
LCSMs. The method can obtain the power for testing each individual parameter of the models
such as the change rate and coupling parameters. We also implement the Monte Carlo procedure
in a free R package RAMpath (Zhang et al., 2015).

In the rest of the chapter, we first present the univariate and bivariate LCSMs. Then, we
introduce the Monte Carlo based method for power analysis. After that, we show how to conduct
power analysis for LCSMs through several examples using our developed software. We conclude
the chapter with discussion and future directions.


A Univariate Latent Change Score Model 


Let $Y[t]_n$ denote the data from the $n$th ($n = 1, \ldots, N$) participant at time $t$ ($t = 1, \ldots, T$)
of a sample consisting of $N$ participants measured $T$ times. The first part of a LCSM is a
measurement error model where an observed score $Y[t]_n$ is the sum of the latent true score $y[t]_n$
and the measurement error/uniqueness score $ey[t]_n$:

\[ Y[t]_n = y[t]_n + ey[t]_n. \]

It is generally assumed that the error follows a normal distribution with mean 0 and variance
varey. The second part of the model builds the relationship between consecutive latent true
scores so that the current score at time $t$ is equal to the sum of the true score at the previous time
$t - 1$ and the change score, $dy[t]_n$, from time $t - 1$ to time $t$:

\[ y[t]_n = y[t-1]_n + dy[t]_n. \]

This effectively defines the change score as

\[ dy[t]_n = y[t]_n - y[t-1]_n. \]


Note that in the classic LCSM, the relationship between consecutive latent true scores is
deterministic although it is not required to be so. The third part of the model concerns the
modeling of the difference scores. One way is to model the difference score at time $t$ as the sum
of a linear constant effect $ys_n$ and the proportional change from time $t - 1$ such that

\[ dy[t]_n = ys_n + \beta_y \times y[t-1]_n, \]

where $\beta_y$ is a compound rate of change.

Given the three parts of the model, we can model the observed score as

\[
\begin{aligned}
Y[t]_n &= y[t]_n + ey[t]_n \\
       &= (1+\beta_y)\, y[t-1]_n + ys_n + ey[t]_n.
\end{aligned}
\]

Applying the above equation recursively leads to

\[
\begin{aligned}
Y[t]_n &= (1+\beta_y)\, y[t-1]_n + ys_n + ey[t]_n \\
       &= (1+\beta_y)\,(y[t-2]_n + dy[t-1]_n) + ys_n + ey[t]_n \\
       &= (1+\beta_y)^{2}\, y[t-2]_n + (1+\beta_y)\, ys_n + ys_n + ey[t]_n \\
       &= (1+\beta_y)^{t-1}\, y[1]_n + [1 + (1+\beta_y) + \ldots + (1+\beta_y)^{t-2}]\, ys_n + ey[t]_n \\
       &= (1+\beta_y)^{t-1}\, y0_n + [1 + (1+\beta_y) + \ldots + (1+\beta_y)^{t-2}]\, ys_n + ey[t]_n
\end{aligned}
\]

where $y0_n = y[1]_n$ is the initial latent score. Note that the latent score at time $t$ follows

\[ y[t]_n = (1+\beta_y)^{t-1}\, y0_n + [1 + (1+\beta_y) + \ldots + (1+\beta_y)^{t-2}]\, ys_n. \]
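The bracketed sum in the last expression is a finite geometric series, so it can be written more compactly. The restatement below is a standard algebraic identity applied to that expression, not a formula taken from the chapter itself:

```latex
\[
\sum_{k=0}^{t-2} (1+\beta_y)^{k} = \frac{(1+\beta_y)^{t-1} - 1}{\beta_y}
\quad (\beta_y \neq 0),
\]
\[
y[t]_n =
\begin{cases}
(1+\beta_y)^{t-1}\, y0_n + \dfrac{(1+\beta_y)^{t-1} - 1}{\beta_y}\, ys_n, & \beta_y \neq 0, \\[2ex]
y0_n + (t-1)\, ys_n, & \beta_y = 0.
\end{cases}
\]
```

The second case makes explicit that the latent trajectory reduces to a straight line in $t$ when $\beta_y = 0$.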


Clearly, the observed and latent scores behave as a nonlinear function of time and therefore can
capture the nonlinear trajectory except when $\beta_y = 0$. To visually show this, we plot the latent
scores with different values for $\beta_y$ in Figure 1. The basic LCSM can only handle this specific type
of nonlinearity with exponential changes. For other types of nonlinearity, more complex LCSMs
are needed.
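To make the shape of these trajectories concrete, the following minimal R sketch computes the latent scores of a single participant in two ways: by applying the change equation step by step, and by evaluating the expression for the latent score at time $t$ directly. The helper names lcs_recursive and lcs_closed, the argument n_time, and the numeric values are illustrative assumptions and are not part of RAMpath.

```r
# Illustrative helpers (not part of RAMpath): latent scores for one participant.
lcs_recursive <- function(betay, y0, ys, n_time) {
  y <- numeric(n_time)
  y[1] <- y0                                 # initial latent score y0
  for (t in seq_len(n_time)[-1]) {
    dy   <- ys + betay * y[t - 1]            # dy[t] = ys + betay * y[t-1]
    y[t] <- y[t - 1] + dy                    # y[t]  = y[t-1] + dy[t]
  }
  y
}

lcs_closed <- function(betay, y0, ys, n_time) {
  sapply(seq_len(n_time), function(t) {
    k <- seq_len(t - 1) - 1                  # exponents 0, ..., t-2 (empty at t = 1)
    (1 + betay)^(t - 1) * y0 + sum((1 + betay)^k) * ys
  })
}

betas <- c(-0.1, 0, 0.1)                     # assumed values of betay for illustration
traj  <- sapply(betas, lcs_recursive, y0 = 20, ys = 1.5, n_time = 6)
colnames(traj) <- paste0("betay=", betas)
round(traj, 2)                               # one column per value of betay
```

In this sketch the column for betay = 0 increases by the constant ys at every occasion, while the other two columns bend away from a straight line.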

The initial latent score and the linear constant change score can be correlated. In the model,
they are assumed to have a bivariate normal distribution

\[
\begin{pmatrix} y0_n \\ ys_n \end{pmatrix}
\sim MN\left[
\begin{pmatrix} \mathit{my0} \\ \mathit{mys} \end{pmatrix},
\begin{pmatrix} \mathit{vary0} & \mathit{vary0ys} \\ \mathit{vary0ys} & \mathit{varys} \end{pmatrix}
\right],
\]

with $MN$ denoting a multivariate, here bivariate, normal distribution. Therefore, the initial latent
score follows a normal distribution with mean my0 and variance vary0 and the constant change
score also follows a normal distribution with mean mys and variance varys. The covariance
between them is vary0ys with the correlation expressed as

\[ \rho_{y0ys} = \frac{\mathit{vary0ys}}{\sqrt{\mathit{vary0} \times \mathit{varys}}}. \]
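The three parts of the model can be combined into a small data generator. The sketch below is only an illustration in base R: simulate_lcs is a hypothetical helper, not the generator used inside RAMpath, and the covariance value vary0ys = 0.2 is an assumed number chosen so that the correlation is visibly nonzero.

```r
# Illustrative generator for univariate LCSM data (not the RAMpath implementation).
simulate_lcs <- function(N, n_time, betay, my0, mys, varey, vary0, varys, vary0ys) {
  Sigma <- matrix(c(vary0, vary0ys, vary0ys, varys), 2, 2)
  # Draw (y0_n, ys_n) from a bivariate normal via the Cholesky factor of Sigma;
  # this requires Sigma to be positive definite.
  z   <- matrix(rnorm(2 * N), N, 2)
  lat <- sweep(z %*% chol(Sigma), 2, c(my0, mys), "+")
  y <- matrix(NA_real_, N, n_time)
  y[, 1] <- lat[, 1]                                   # latent score at the first occasion
  for (t in seq_len(n_time)[-1]) {
    y[, t] <- (1 + betay) * y[, t - 1] + lat[, 2]      # y[t] = y[t-1] + ys + betay * y[t-1]
  }
  Y <- y + matrix(rnorm(N * n_time, sd = sqrt(varey)), N, n_time)  # add measurement error
  list(Y = Y, y0 = lat[, 1], ys = lat[, 2])
}

set.seed(1)
sim <- simulate_lcs(N = 5000, n_time = 5, betay = 0.1, my0 = 20, mys = 1.5,
                    varey = 9, vary0 = 2.5, varys = 0.05, vary0ys = 0.2)
rho_pop <- 0.2 / sqrt(2.5 * 0.05)          # population correlation from the formula
c(population = rho_pop, sample = cor(sim$y0, sim$ys))
```

With a large N the sample correlation between the simulated initial scores and constant changes should be close to the population value given by the correlation formula.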

Using a path diagram, this model is portrayed in Figure 2. In the path diagram, squares 
represent observed variables, while circles represent latent variables. A single-headed arrow is for 
deterministic parameters such as regression coefficients, or factor loadings, while a double-headed 
arrow represents stochastic parameters such as variance and covariance. A triangle represents a 
constant. Any arrow originating from the triangle represents an intercept or mean of the variable
to which the arrow points. We matched the notations in the formulas and in the path diagram. For
simplification, we removed the brackets and the subscripts for the variables in the path diagram.


A Bivariate Latent Change Score Model 


A bivariate LCSM is first a combination of two univariate LCSMs. Above and beyond that,
it allows the two processes represented by the LCSMs to interact with each other. Let $Y[t]_n$ and
$X[t]_n$ denote the observed data on the two variables $Y$ and $X$, respectively, from the $n$th
($n = 1, \ldots, N$) participant at time $t$ ($t = 1, \ldots, T$) of a sample consisting of $N$ participants
measured $T$ times. For the measurement error part of the model, we have

\[
\begin{aligned}
Y[t]_n &= y[t]_n + ey[t]_n \\
X[t]_n &= x[t]_n + ex[t]_n,
\end{aligned}
\]

where $ey[t]_n$ follows a normal distribution with mean 0 and variance varey and $ex[t]_n$ follows a
normal distribution with mean 0 and variance varex. For the latent score from time $t - 1$ to time
$t$, we have

\[
\begin{aligned}
y[t]_n &= y[t-1]_n + dy[t]_n \\
x[t]_n &= x[t-1]_n + dx[t]_n,
\end{aligned}
\]

with $dy[t]_n$ and $dx[t]_n$ denoting the latent change scores for the two variables, respectively.

The innovative part of the bivariate LCSM is to allow the latent score of one variable to
influence the change score of another variable. Specifically, we model the change scores as

\[
\begin{aligned}
dy[t]_n &= ys_n + \beta_y \times y[t-1]_n + \gamma_y\, x[t-1]_n \\
dx[t]_n &= xs_n + \beta_x \times x[t-1]_n + \gamma_x\, y[t-1]_n
\end{aligned}
\]

where $\gamma_y$ and $\gamma_x$ are called coupling parameters. $\gamma_y$ represents the effect of $x$ on the change score
of $y$ and $\gamma_x$ represents the effect of $y$ on the change score of $x$. We let $x0_n$ be the initial latent score
and $xs_n$ be the constant change for $x$. A multivariate normal distribution is assumed for the initial
latent scores and constant changes for the two variables such that

\[
\begin{pmatrix} y0_n \\ ys_n \\ x0_n \\ xs_n \end{pmatrix}
\sim MN\left[
\begin{pmatrix} \mathit{my0} \\ \mathit{mys} \\ \mathit{mx0} \\ \mathit{mxs} \end{pmatrix},
\begin{pmatrix}
\mathit{vary0}   & \mathit{vary0ys} & \mathit{varx0y0} & \mathit{vary0xs} \\
\mathit{vary0ys} & \mathit{varys}   & \mathit{varx0ys} & \mathit{varxsys} \\
\mathit{varx0y0} & \mathit{varx0ys} & \mathit{varx0}   & \mathit{varx0xs} \\
\mathit{vary0xs} & \mathit{varxsys} & \mathit{varx0xs} & \mathit{varxs}
\end{pmatrix}
\right].
\]


Using a path diagram, a bivariate LCSM is portrayed in Figure 3. 
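To see what the coupling parameters do, the following sketch iterates the two change equations for a single participant without measurement error. The helper blcs_latent is hypothetical (it is not a RAMpath function); the parameter values are borrowed from the bivariate example later in the chapter, with the initial scores and constant changes set to their means.

```r
# Illustrative helper (not part of RAMpath): coupled latent scores for one participant.
blcs_latent <- function(n_time, y0, ys, x0, xs, betay, betax, gammay, gammax) {
  y <- x <- numeric(n_time)
  y[1] <- y0
  x[1] <- x0
  for (t in seq_len(n_time)[-1]) {
    dy <- ys + betay * y[t - 1] + gammay * x[t - 1]   # x influences the change in y
    dx <- xs + betax * x[t - 1] + gammax * y[t - 1]   # y influences the change in x
    y[t] <- y[t - 1] + dy
    x[t] <- x[t - 1] + dx
  }
  cbind(y = y, x = x)
}

coupled   <- blcs_latent(5, y0 = 20, ys = 1.5, x0 = 20, xs = 5,
                         betay = 0.08, betax = 0.2, gammay = -0.1, gammax = 0)
uncoupled <- blcs_latent(5, y0 = 20, ys = 1.5, x0 = 20, xs = 5,
                         betay = 0.08, betax = 0.2, gammay = 0, gammax = 0)
cbind(y_coupled = coupled[, "y"], y_uncoupled = uncoupled[, "y"], x = coupled[, "x"])
```

Because gammax is 0 here, the trajectory of x is the same in both runs, whereas the negative gammay pulls the trajectory of y below its uncoupled counterpart from the second occasion onward.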


Statistical Power Analysis Based on Monte Carlo Simulation 


Statistical power analysis concerns the power of a test to detect an effect different from the
null. For a model with a set of parameters $\theta$, one can conduct power analysis for one or a subset
of parameters, denoted by $\tau$, to investigate whether they are equal to 0 or known values $\tau_0$.
Therefore, the null and alternative hypotheses of interest are

\[ H_0: \tau = \tau_0 \quad \text{vs.} \quad H_1: \tau \neq \tau_0. \]

Existing procedures for power evaluation are mostly based on the Wald test or the likelihood ratio
test. The Wald test statistic is defined as

\[ T = (\hat{\tau} - \tau_0)'\, \hat{\Phi}^{-1}\, (\hat{\tau} - \tau_0) \tag{1} \]

where $\hat{\tau}$ is the parameter estimates in $\hat{\theta}$ corresponding to $\tau$ and $\hat{\Phi}$ is the covariance matrix of $\hat{\tau}$.
The Wald test statistic can be compared to a critical value $C_\alpha$ under the null hypothesis. If
$T > C_\alpha$, the null hypothesis is rejected. Under the null hypothesis and the typical normality
assumption, the Wald statistic asymptotically follows a chi-square distribution ($\chi^2_q$) with
$q$ degrees of freedom, where $q$ is the number of parameters in $\tau$. Then, the critical value at the
significance level $\alpha$ is $C_\alpha = \chi^2_q(1 - \alpha)$. Note that when working with a single parameter, the
Wald test is the square of a Z test.
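The relation between the Wald statistic and the Z test for a single parameter can be checked numerically. In the sketch below, wald_stat is a hypothetical helper that evaluates the quadratic form of the Wald statistic; the estimate 0.103 and standard error 0.044 are used purely as illustrative inputs.

```r
# Illustrative helper: Wald statistic for q parameters given their covariance matrix Phi.
wald_stat <- function(tau_hat, tau0, Phi) {
  d <- tau_hat - tau0
  drop(t(d) %*% solve(Phi, d))               # (tau_hat - tau0)' Phi^{-1} (tau_hat - tau0)
}

est <- 0.103                                 # assumed estimate of a single parameter
se  <- 0.044                                 # assumed standard error
z      <- est / se                           # Z statistic for H0: parameter = 0
T_wald <- wald_stat(est, 0, matrix(se^2))    # Wald statistic with q = 1
crit   <- qchisq(0.95, df = 1)               # chi-square critical value at alpha = 0.05

c(z_squared = z^2, wald = T_wald, critical = crit, z_critical_squared = qnorm(0.975)^2)
T_wald > crit                                # TRUE means H0 is rejected
```

The first two entries coincide, and so do the last two, which is why the Z test with critical value 1.96 and the one-parameter Wald test with critical value 3.84 always reach the same decision.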

The likelihood ratio test works in a similar manner. In the likelihood ratio test, one first
estimates the model under the alternative hypothesis to get the value of the likelihood function,
$L_1$. Then, one estimates the model under the null hypothesis by fixing the parameters in $\tau$ to
$\tau_0$ to get the value of the likelihood function, $L_0$. The likelihood ratio test statistic is defined as

\[ T = -2 \ln \frac{L_0}{L_1}. \tag{2} \]

The likelihood ratio test statistic is also compared to a critical value $C_\alpha$ to decide whether a null
hypothesis should be rejected. If $T$ follows a chi-square distribution with degrees of freedom $q$,
the critical value is $\chi^2_q(1 - \alpha)$. If $T > C_\alpha$, the null hypothesis is rejected.
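The mechanics of the likelihood ratio test are the same for any pair of nested models. As a small stand-in for an LCSM, the sketch below uses a simple linear regression in base R, where the null model fixes the slope at 0; the data-generating values are arbitrary assumptions.

```r
# Likelihood ratio test for one parameter, illustrated with nested linear models.
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)                      # data generated with a nonzero slope

fit1 <- lm(y ~ x)                            # model under H1 (slope freely estimated)
fit0 <- lm(y ~ 1)                            # model under H0 (slope fixed at 0)

ll1 <- as.numeric(logLik(fit1))              # ln L1
ll0 <- as.numeric(logLik(fit0))              # ln L0
lrt <- -2 * (ll0 - ll1)                      # T = -2 ln(L0 / L1)

crit   <- qchisq(0.95, df = 1)               # one parameter is fixed under H0, so q = 1
reject <- lrt > crit
c(LRT = lrt, critical = crit)
```

Since the null model is nested in the alternative, the statistic cannot be negative, and it grows as the restriction fits the data worse.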


By its definition, the statistical power is

\[
\begin{aligned}
\pi &= \Pr(\text{reject } H_0 \mid H_1 \text{ is true}) \\
    &= \Pr(T > C_\alpha \mid H_1 \text{ is true}),
\end{aligned}
\]

where $T$ can be the Wald statistic or the likelihood ratio test statistic. For simple statistical
analysis such as a t-test, one can obtain an analytical form for $\pi$ and therefore power and sample
size planning can be conducted easily. However, for LCSMs, both the Wald and the likelihood
ratio test statistics are complex functions of sample size and effect size. Therefore, the statistical
power $\pi$ is also a complex function of these factors. Generally speaking, it is difficult or
impossible to get a tractable form of $\pi$ so that the relationship between statistical power and
sample size can be easily evaluated.

To deal with the difficulty in power analysis for LCSMs, we use a Monte Carlo simulation
based method to approximate the power using the relative frequency of rejecting the null hypothesis
given that the alternative hypothesis is true. Specifically, the following procedure can be used.

1. Decide the significance level. Usually, the default 0.05 can be used. Based on that, get
the critical value $C_\alpha$. If only one parameter is tested, $C_\alpha$ is 1.96 for a Z test based on the
normal distribution and 3.84 for the Wald test based on the chi-square distribution.

2. Specify an LCSM $M_1$ under $H_1$ with the hypothesized population parameter values ($\theta$).

3. Generate a set of data with the sample size $N$ and the number of measurement occasions
$T$ from the model using random number generation techniques.

4. Fit Model $M_1$ and Model $M_0$, the model obtained by setting $\tau = \tau_0$, to the generated data and obtain the
Wald statistic using Equation (1) and/or the likelihood ratio statistic using Equation (2).

5. If the test statistic $T > C_\alpha$, the null hypothesis $H_0$ is rejected.

6. Repeat Steps 2 to 5 for a total of $R$ ($R \geq 1000$) times.

7. Suppose out of the $R$ replications, the null hypothesis $H_0$ is rejected $r$ times. Then the
statistical power with the sample size $N$ is estimated by $\hat{\pi} = r/R$.

8. For sample size planning, if $\hat{\pi}$ is smaller than the desired power, say 0.8, one can increase
the sample size or the number of measurement occasions and repeat Steps 2 to 7 to recalculate the
power. Otherwise, the sample size or the number of measurement occasions can be set to a
smaller value.
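The steps above can be sketched in a few lines of base R. To keep the example self-contained and fast, it replaces the LCSM with a simple regression slope tested by a Z test, so it illustrates the logic of the procedure rather than the RAMpath implementation. The function mc_power, the slope of 0.3, and the use of only 500 replications are assumptions made for illustration.

```r
# Monte Carlo power for a regression slope (a simple stand-in for an LCSM parameter).
mc_power <- function(N, R = 1000, slope = 0.3, alpha = 0.05) {
  crit   <- qnorm(1 - alpha / 2)                  # Step 1: critical value of the Z test
  reject <- logical(R)
  for (r in seq_len(R)) {                         # Step 6: repeat R times
    x  <- rnorm(N)                                # Step 3: generate data under H1
    y  <- slope * x + rnorm(N)
    co <- summary(lm(y ~ x))$coefficients         # Step 4: fit the model
    z  <- co["x", "Estimate"] / co["x", "Std. Error"]
    reject[r] <- abs(z) > crit                    # Step 5: test decision
  }
  mean(reject)                                    # Step 7: estimated power r / R
}

set.seed(2024)
sizes      <- c(50, 100, 200)                     # Step 8: try several sample sizes
power_by_n <- sapply(sizes, mc_power, R = 500)    # R = 500 only to keep the run short
setNames(power_by_n, paste0("N=", sizes))
```

For an actual planning exercise the number of replications would be raised to at least 1,000, in line with Step 6, and the data generation and model fitting steps would be replaced by those of the LCSM.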


The above Monte Carlo simulation based method for statistical power analysis has been widely 
used in the literature for mediation analysis and SEM (e.g., Muthén & Muthén, 2002; Thoemmes 
et al., 2010; Zhang & Wang, 2009; Zhang, 2014). This procedure is especially effective for 
complex models. For example, Muthén & Muthén (2002) illustrated how to use Mplus to conduct 
statistical power analysis for structural equation models using such a procedure. Zhang & Wang 
(2009) focused on how to conduct statistical power analysis for growth curve models with and 
without missing data. Thoemmes et al. (2010) discussed how to apply the procedure in mediation 
analysis. Zhang (2014) extended Thoemmes et al. (2010) to the analysis of missing data and
non-normal data.


For a typical power analysis for LCSM, a single parameter is often of interest. For example,
one may be only interested in the parameter $\beta_y$ in the model. In this case, the power can be
calculated using the above procedure based on the Z test. Since Monte Carlo simulation is
conducted, when estimating the power for one parameter, power for all the other parameters can
also be calculated without much extra effort. Therefore, in our software, we output the power for
all parameters in a LCSM model as we will show in our examples.


Software for Power Analysis for Latent Change Score Models 


Although the idea of Monte Carlo simulation based power analysis is straightforward,
software that implements it is still needed to make it useful. Recently, Zhang et al. (2015)
developed the R package RAMpath that can estimate both univariate and bivariate LCSMs. We
expanded RAMpath so that it can carry out power analysis for LCSMs. To further simplify power
analysis for researchers who might not be familiar with R, we also developed online software
based on RAMpath.


R package 


The R package RAMpath is now on CRAN and therefore it can be installed directly within
R as a typical package. For example, to install it, use the R code
install.packages("RAMpath"). To use the package within R, use
library("RAMpath"). There are three functions in the package for power analysis:
powerLCS, powerBLCS, and plot.

The function powerLCS is used to conduct power analysis for univariate LCSMs. The
basic usage of the function is given below:

powerLCS(N = 100, T = 5, R = 1000, betay = 0, my0 = 0, mys = 0,
    varey = 1, vary0 = 1, varys = 1, vary0ys = 0, alpha = 0.05,
    ...)

In the function, N is the sample size and T is the number of measurement occasions. Both of
them can be a single value or a vector. For example, using N=c(100,200,500) will calculate
power for the three provided sample sizes. R is the number of Monte Carlo replications used to
estimate the power. A larger R will provide more accurate power estimation but also take more
computing time. As a rule of thumb, at least 1,000 should be used. alpha is the significance
level for testing the hypothesis of the model parameters. The default value is 0.05.

To obtain statistical power, the population parameter values have to be provided. Such 
values can be decided based on literature review, pilot study, expert opinions, etc. By default, all 
the mean, intercept and covariance parameters are set at 0 and all the variance parameters are set 
at 1. Those values typically have to be changed in real power analysis. Note that the name of each 
parameter corresponds to that used in the path diagram in Figure 2. In addition to the basic input, 
for advanced users, other information can be provided to control the parameter and standard error
estimation methods. For example, the options used in lavaan to control model estimation can be
used directly within the function. More information can be found in the help document of the R 
package. 

The output of the R function includes four main pieces of information for each parameter in
the model. The first is the Monte Carlo estimate (mc.est). It is calculated as the mean of the R
sets of parameter estimates from the simulated data. Note that the Monte Carlo estimates should
be close to the population parameter values used in the model. The second is the Monte Carlo
standard deviation (mc.sd), which is calculated as the standard deviation of the R sets of
parameter estimates. The third is the Monte Carlo standard error (mc.se), which is obtained as
the average of the R sets of standard error estimates of the parameter estimates. Lastly,
mc.power is the statistical power for each parameter.
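The four summaries are simple functions of the replicated estimates and standard errors. The sketch below computes them for a single regression slope, used as a stand-in for an LCSM parameter; it mirrors the definitions given above but is not the code used inside RAMpath, and the population value 0.3, N = 100, and 500 replications are illustrative assumptions.

```r
# Monte Carlo summaries for one parameter, following the definitions in the text.
set.seed(11)
R   <- 500                                   # number of replications (kept small here)
N   <- 100                                   # sample size per replication
pop <- 0.3                                   # assumed population value of the slope

est <- se <- numeric(R)
for (r in seq_len(R)) {
  x  <- rnorm(N)
  y  <- pop * x + rnorm(N)
  co <- summary(lm(y ~ x))$coefficients
  est[r] <- co["x", "Estimate"]              # parameter estimate in replication r
  se[r]  <- co["x", "Std. Error"]            # its estimated standard error
}

out <- c(pop.par  = pop,
         mc.est   = mean(est),               # mean of the R estimates
         mc.sd    = sd(est),                 # standard deviation of the R estimates
         mc.se    = mean(se),                # average of the R standard errors
         mc.power = mean(abs(est / se) > qnorm(0.975)))  # rejection rate of the Z test
round(out, 3)
```

When the estimation works well, mc.est is close to pop.par and mc.se is close to mc.sd, which are the two checks used in the examples of this chapter.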

The function powerBLCS is used to conduct power analysis for bivariate LCSMs. The
basic usage of the function is given below. It is used in the same way as powerLCS for the univariate LCSMs.

powerBLCS(N=100, T=5, R=1000, betay=0, my0=0, mys=0, varey=1,
    vary0=1, varys=1, vary0ys=0, betax=0, mx0=0, mxs=0, varex=1,
    varx0=1, varxs=1, varx0xs=0, varx0y0=0, varx0ys=0, vary0xs=0,
    varxsys=0, gammax=0, gammay=0, alpha=0.05, ...)



The function plot is used to generate a power curve, which has the form plot(x,
parameter, ...). The first input of the function, x, is the output from either powerLCS or
powerBLCS. In the input of the function for power analysis, either the sample size N or the
number of occasions T should be a vector. The second input is the name of a parameter to plot its
power curve. Since there are multiple parameters in a LCSM, one can generate a plot for each
model parameter. The name of a parameter should match the one in powerLCS or powerBLCS.
This function will generate one or multiple line plots in which power is shown on the y-axis and
sample size or the number of occasions is shown on the x-axis.


Online interface 


In order to help researchers who are not familiar with R, we also provide a Web-based
interface for power analysis for LCSMs. The URL for the univariate LCSMs is
http://psychstat.org/lcsm and for the bivariate LCSMs is
http://psychstat.org/blcsm.



The Web interface for the univariate LCSMs is shown in Figure 4. Since the interface is 
built on the R function shown earlier, it requires the same input information and gives the same 
output. For both sample size and number of occasions, multiple values can be provided in two 
ways to calculate power for each given value. We discuss this using the sample size as an example 
since the same method is used for the number of occasions. First, multiple sample sizes can be 
provided and separated by spaces. For example, inputting 100 150 200 will calculate power for 
the three sample sizes 100, 150, and 200. Second, a sequence of sample sizes can be generated 
using the format s:e:i with s denoting the starting sample size, e the ending sample size, and i
the interval. Note that the values are separated by a colon “:”. For example, 100:150:10 will
generate a sequence of sample sizes: 100 110 120 130 140 150.
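The two input formats map directly onto R vectors. The helper below is a hypothetical illustration of that mapping, not the code behind the Web interface; the name parse_sizes is an assumption.

```r
# Hypothetical parser for the two input formats described above.
parse_sizes <- function(spec) {
  spec <- trimws(spec)
  if (grepl(":", spec, fixed = TRUE)) {
    p <- as.numeric(strsplit(spec, ":", fixed = TRUE)[[1]])
    stopifnot(length(p) == 3)                # expects start:end:interval
    seq(p[1], p[2], by = p[3])
  } else {
    as.numeric(strsplit(spec, "\\s+")[[1]])  # values separated by spaces
  }
}

parse_sizes("100 150 200")                   # three separate sample sizes
parse_sizes("100:150:10")                    # sequence from 100 to 150 in steps of 10
```

The resulting vector is the kind of value that can be passed to the argument N or T of powerLCS when working in R directly.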


The interface for the bivariate LCSMs is similar and is not provided here for the sake of
space.

Examples

In this section, we show how to carry out power analysis for both univariate and bivariate
LCSMs through several examples.


Example 1. Type I error rate investigation for a univariate LCSM 


Note that if the null hypothesis is true, the Monte Carlo procedure will yield the type I error
rate. For example, suppose the parameter $\beta_y = 0$ in the population. Then the estimated power for
it should be the same as the significance level, typically 0.05. For illustration, we set the
population parameter values to those shown in the second column of Table 1. Therefore, if we
conduct a power analysis based on those parameter values, we will obtain the type I error rates for
betay, my0, mys, and vary0ys. If our Monte Carlo procedure performs well, we expect the
type I error rates to be close to the alpha level used.

The R code for conducting the analysis is shown in Code 1. Note that the significance level
is set at 0.05 and therefore, we expect the estimated values in the power column to be close to 0.05.

Code 1: R input script for Example 1.

powerLCS(N = 100, T = 5, R = 1000, betay = 0, my0 = 0, mys = 0,
    varey = 1, vary0 = 1, varys = 1, vary0ys = 0, alpha = 0.05)


The output of the R code is given in Code 2. First, the estimate for each parameter is very
close to the true population parameter value, as can be seen in the column named mc.est. This
indicates that the power calculation procedure runs well. Second, the Monte Carlo standard errors
are close to the corresponding Monte Carlo standard deviations, another indicator that the power
calculation is trustworthy. Third, as expected, the power for betay, my0, mys, and vary0ys is
close to 0.05, the nominal type I error rate. Overall, this suggests that the Monte Carlo based
method can provide well-controlled type I error rates.
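How close to 0.05 should the estimated rates be? A rough gauge, assuming that the $R = 1000$ replications are independent and that the true rejection rate is exactly 0.05, is the binomial standard error of the estimated rate. This calculation is added here for orientation and is not part of the RAMpath output:

```latex
\[
\mathrm{SE}(\hat{\pi}) = \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069,
\qquad
0.05 \pm 1.96 \times 0.0069 \approx (0.036,\ 0.064).
\]
```

Under these assumptions, estimated rates such as 0.044 or 0.056 are within the range that Monte Carlo error alone would produce.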


Code 2: Type I error rate and power for parameters in Example 1. 


pop.par mc.est mc.sd mc.se mc.power N T
betay 0 0.001 0.056 0.056 0.046 100 5
my 0 Or O00 Ost Zo O26 0.056.100" 5 
mys 0 0,002 0.105 0.105 0.044 100 5 
varey 1 0.994 0.083 0.082 1.000 100 5
vary0 iy: Oy S990" Oe 236) 04250 1,000. 100° 5 
varyOys 0 -0.005 0.136 0.136 0.044 100 5 
varys ds Ee OOo Oe aes Le 0O0) L005 


Example 2. Power analysis for a univariate LCSM 


To conduct a power analysis, one has to specify the population parameter values for the
model. Zhang et al. (2015) included an example on using a univariate LCSM to analyze
the WISC data (see McArdle & Nesselroade, 2014). In order to plan a future study with the
sample size 100 and 5 measurement occasions, we use the estimates as our population parameter
values. Column 3 in Table 1 shows the rounded parameter estimates used in our example.

The R code for conducting the analysis is shown in Code 3 and the output of the R code is
given in Code 4. From the output, we can see that the power to detect the parameter betay to be
significant with a sample size of 100 and 5 measurement occasions is about 0.664. The
power for another parameter, the constant change mys, is 0.274. Since one often hopes to
achieve a power of at least 0.8, a larger sample size is needed for this study. In addition,
different parameters often require different sample sizes to reach the same power.

Code 3: R input script for Example 2.

powerLCS(N = 100, T = 5, R = 1000, betay = 0.1, my0 = 20, mys =
    1.5, varey = 9, vary0 = 2.5, varys = .05, vary0ys = 0, alpha =
    0.05)


Code 4: Power for parameters in Example 2. 


pop.par mc.est mc.sd mc.se mc.power N T
betay 0.10 0.103 0.043 0.044 0.664 100 5
my0 20.00 19.999 0.324 0.319 1.000 100 5
mys 1.50 1.418 1.106 1.120 0.274 100 5
varey 9.00 8.961 0.724 0.732 1.000 100 5
vary0 2.50 2.463 1.151 1.139 0.583 100 5
vary0ys 0.00 -0.004 0.408 0.403 0.048 100 5
varys 0.05 0.053 0.173 0.175 0.050 100 5


Example 3. Generate a power curve for different sample sizes for a univariate LCSM 


Example 2 above showed that a larger sample size was needed in order to get sufficient
power for parameters betay and mys. Although one can try a different sample size greater
than 100, for convenience, we can generate a power curve with different sample sizes. For
example, Figure 5 shows the power curves for the two parameters betay and mys with sample
sizes ranging from 100 to 200 with an interval of 10. From the plot, we can easily see that to get a
power of 0.8 for the parameter betay, a sample size of about 150 is needed. On the other hand, a
sample size larger than 200 is needed for the parameter mys to have a power of 0.8; the exact
number cannot be determined from the plot.
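A quick plausibility check of such a curve is possible with a normal approximation, which is a different technique from the Monte Carlo procedure of this chapter. The sketch below assumes that the standard error of an estimate shrinks in proportion to $1/\sqrt{N}$ and takes the population values and the mc.se values at N = 100 from Code 4 as its starting point; the helper approx_power is hypothetical.

```r
# Approximate two-sided power of a Z test, scaling a reference standard error by sqrt(N).
approx_power <- function(N, est, se_ref, N_ref = 100, alpha = 0.05) {
  se   <- se_ref * sqrt(N_ref / N)           # assumed 1/sqrt(N) scaling of the standard error
  crit <- qnorm(1 - alpha / 2)
  z    <- est / se
  pnorm(z - crit) + pnorm(-z - crit)         # probability that |Z| exceeds the critical value
}

N <- seq(100, 200, 10)
curve_betay <- approx_power(N, est = 0.10, se_ref = 0.044)   # betay: value and mc.se from Code 4
curve_mys   <- approx_power(N, est = 1.50, se_ref = 1.120)   # mys: value and mc.se from Code 4
round(cbind(N, betay = curve_betay, mys = curve_mys), 3)
```

Under these assumptions the approximation gives a power of roughly 0.62 for betay at N = 100 and crosses 0.8 between 150 and 160, while the curve for mys stays well below 0.8 up to N = 200, broadly in line with what the simulated curves show. The approximation is only a sanity check and does not replace the simulation.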

The R code for generating the power curve is shown in Code 5. Note that in the plot
function, we refer to a specific parameter directly using its name. In the input, seq(100, 200,
10) generates a sequence of sample sizes and in the output, power for each sample size is
provided. To save space, Code 6 shows the output for the sample sizes 100 and 200 only.

Code 5: R input script for power curve in Example 3.

res <- powerLCS(N = seq(100, 200, 10), T = 5, R = 1000, betay =
    0.1, my0 = 20, mys = 1.5, varey = 9, vary0 = 2.5, varys = .05,
    vary0ys = 0, alpha = 0.05)
plot(res, 'betay')
plot(res, 'mys')

Code 6: Output for generating power curves in Example 3.

$`N100-T5`

pop.par mc.est mc.sd mc.se mc.power N T 
betay 0.10 0.100 0.044 0.044 0.627 100 5 
my 0 2000. 20.002 O33 Ws 319 1.000 100 5 
mys Leo: « Ms BOG ly 3 Ge eG 0.287 100 5 
varey 9 300° BFPO (Ow P44. 0 782 1.000 100 5 
vary0 2.50 2.489 1.218 1.146 0.599 100 5
varyOys O56 80-9. 2008: Or, AS. Oa. O.0 59° 100: "5 
varys 0.05 0.054 0.176 0.175 0.050 100 5 
$`N200-T5`

pop.par mc.est mc.sd mc.se mc.power N T 
betay Dele 4100 We03t 0.032 Oy oLS 200-5 
my 0 20200 206007: 05 225 4226 14-0.0:0.-200) 5 
mys PepO T2505 0.790) 0529: 0.487 200 5 
varey 9.00 8.971 0.532 0.518 1.000 200 5
vary0 2.50 2.480 0.803 0.808 0.904 200 5
varyOys O..00° “05005 0.:283. 0.283 0.049 200 5 
varys 0.05 0.052 0.125 0.122 0.054 200 5


Example 4. Generate a power curve for different numbers of occasions for a univariate LCSM

For LCSMs, power is related not only to the sample size but also to the number of
measurement occasions. As the number of occasions increases, one would expect the
power to increase. For example, Figure 6 shows the power curves for the two parameters betay
and mys with the number of occasions ranging from 4 to 10 with an interval of 1 and with the
sample size fixed at 100. From the plot, we can easily see that the power increases along with the number
of measurement occasions. For example, for the same sample size 100, the power is less than 0.2
with 4 occasions of data but increases to more than 0.8 with 7 occasions of data for the parameter
mys. The R code for generating the power curve is shown in Code 7.

Code 7: R input script for power curve with the number of occasions in Example 4.

res <- powerLCS(N = 100, T = 4:10, R = 1000, betay = 0.1, my0 =
    20, mys = 1.5, varey = 9, vary0 = 2.5, varys = .05, vary0ys =
    0, alpha = 0.05)
plot(res, 'betay')
plot(res, 'mys')


Example 5. Power analysis for a bivariate LCSM 


Power analysis can be similarly conducted for bivariate LCSMs. As an example, we use the 
parameter estimates from a bivariate latent change score model in Zhang et al. (2015) with some 
modification as population parameter values (see Table 2). 

The script in Code 8 shows the R code for power analysis for the bivariate LCSM with the
sample size 100. From the output in Code 9, we can see that the parameter estimates are not very
accurate. This is because the bivariate LCSM requires a much larger sample size to provide
accurate parameter estimates. In this case, the statistical power obtained might not be accurate
either.

Code 8: R input script for power analysis for bivariate latent change score model in Example 5.

powerBLCS(N=100, T=5, R=1000, betay=0.08, my0=20, mys=1.5, varey=9,
    vary0=3, varys=1, vary0ys=0, alpha=0.05, betax=0.2, mx0=20,
    mxs=5, varex=9, varx0=3, varxs=1, varx0xs=0, varx0y0=1,
    varx0ys=0, vary0xs=0, varxsys=0, gammax=0, gammay=-0.1)


Code 9: Output for power analysis in Example 5.

         pop.par  mc.est  mc.sd  mc.se  mc.power    N  T
betax       0.20   0.238  0.260  0.087     0.241  100  5
betay       0.08   0.164  0.572  0.435     0.081  100  5
gammax      0.00  -0.038      ?  0.198         ?  100  5
gammay     -0.10  -0.175  0.641  0.458     0.075  100  5
mx0        20.00  20.004  0.336  0.326     1.000  100  5
mxs         5.00       ?      ?      ?         ?  100  5
my0        20.00       ?  0.346  0.326     1.000  100  5
mys         1.50       ?      ?      ?         ?  100  5
varex       9.00   8.941  0.744  0.732     1.000  100  5
varey       9.00   8.939      ?  0.720     1.000  100  5
varx0       3.00   3.029  1.243  1.222     0.739  100  5
varx0xs     0.00       ?      ?      ?     0.030  100  5
varx0y0     1.00   1.052  0.840  0.855         ?  100  5
varx0ys     0.00  -0.012  0.663  0.601         ?  100  5
varxs          ?       ?  6.205  2.568     0.090  100  5
varxsys     0.00   0.072  3.559  1.740     0.019  100  5
vary0       3.00   2.951  1.423  1.245     0.684  100  5
vary0xs     0.00   0.198  2.263  1.629         ?  100  5
vary0ys     0.00       ?      ?      ?     0.106  100  5
varys          ?       ?  3.760  2.096     0.024  100  5


Increasing the sample size leads to more accurate results, as shown in Code 10 where the
sample size is 500. In planning the sample size for LCSMs, one should pay attention to the
parameter estimates to make sure they are accurate enough for power calculation. Specifically,
for the coupling parameters gammax and gammay, the type I error rate and the power are 0.057
and 0.271, respectively, because the population value of gammax is 0 and that of gammay is -0.1.


Code 10: Output for power analysis in Example 5 when the sample size is 500.

         pop.par   mc.est  mc.sd  mc.se  mc.power    N  T
betax       0.20   0.2009      ?      ?     1.000  500  5
betay       0.08   0.0830  0.070  0.068         ?  500  5
gammax      0.00        ?      ?  0.029     0.057  500  5
gammay     -0.10        ?  0.072  0.073     0.271  500  5
mx0        20.00  19.9911  0.145  0.145     1.000  500  5
mxs         5.00   5.0308  0.939  0.942     1.000  500  5
my0        20.00        ?      ?      ?     1.000  500  5
mys         1.50   1.4684  0.889  0.885         ?  500  5
varex       9.00   8.9836  0.340  0.328     1.000  500  5
varey       9.00   8.9961  0.347  0.328     1.000  500  5
varx0       3.00   3.0052  0.524  0.523     1.000  500  5
varx0xs     0.00  -0.0144  0.222  0.230     0.047  500  5
varx0y0     1.00   1.0064  0.360  0.360     0.808  500  5
varx0ys     0.00        ?  0.199  0.201     0.051  500  5
varxs       1.00   1.0312  0.180  0.189     1.000  500  5
varxsys     0.00        ?      ?      ?         ?  500  5
vary0       3.00        ?      ?      ?     1.000  500  5
vary0xs     0.00        ?  0.286  0.294         ?  500  5
vary0ys     0.00        ?      ?      ?     0.043  500  5
varys       1.00        ?  0.260  0.253         ?  500  5
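
One simple way to judge whether the Monte Carlo estimates are accurate enough is to compare mc.est with pop.par and mc.se with mc.sd. The sketch below is an illustration only. The helper accuracy_check is hypothetical and not part of RAMpath, and the reading of the columns (mc.est as the average estimate, mc.sd as the standard deviation of the estimates, mc.se as the average standard error) is assumed here. The input values are the gammay and vary0 rows of Code 9 (N = 100).

```r
# Hypothetical helper (not part of RAMpath): summarize the accuracy of
# Monte Carlo results by relative bias and the ratio mc.se / mc.sd.
accuracy_check <- function(pop.par, mc.est, mc.sd, mc.se) {
  # Relative bias is undefined when the population value is 0
  rel.bias <- ifelse(pop.par == 0, NA_real_, (mc.est - pop.par) / pop.par)
  # A ratio near 1 means the standard errors track the sampling variability
  se.ratio <- mc.se / mc.sd
  data.frame(rel.bias = rel.bias, se.ratio = se.ratio)
}

# gammay and vary0 rows from Code 9 (N = 100, T = 5)
res100 <- accuracy_check(
  pop.par = c(gammay = -0.10, vary0 = 3.00),
  mc.est  = c(-0.175, 2.951),
  mc.sd   = c(0.641, 1.423),
  mc.se   = c(0.458, 1.245)
)
print(round(res100, 3))
```

For these two rows, the relative bias is 0.75 for gammay but about -0.016 for vary0, and the ratios of mc.se to mc.sd are about 0.71 and 0.87. The coupling parameter is therefore estimated much less accurately than the intercept variance at this sample size.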
Discussion and Future Directions


To complement the research on LCSMs, in this chapter, we discuss how to plan the sample 
size and the number of measurement occasions for both univariate and bivariate LCSMs. 
Specifically, we illustrate how to calculate power for each individual model parameter of interest. 
Since the analytical solution to power is intractable, we used a Monte Carlo based method. We 
also provided an R package RAMpath and an online interface to carry out the power analysis 
procedure. 

In calculating power, we need the information on the population parameter values. Each 
value can be viewed as the unstandardized effect size for the parameter of interest. We did not 
define the standardized effect size such as Cohen’s d (Cohen, 1988) for several reasons. First, 
given the complexity of LCSMs, it is difficult to define a standardized effect size. Second, in 
general, it is easier to specify the unstandardized effect size because when conducting a literature 
review, one can simply adopt the parameter estimates directly from the published results. Third, if 
a researcher is interested in standardized measures, he/she can use the standardized coefficients as 
the population parameter values in conducting power analysis. 

One way to streamline the specification of the population parameters is to use the
R-squared. For example, in a bivariate LCSM, the variance of the change score comes from three
sources: the constant change, the variable's own level score, and the level score of the other
variable. By changing the parameter values, one can quantify the portion of variance explained by
each source. Conversely, one can set the parameter values according to the expected variance
explained. Using this method, one can take advantage of the existing effect size cutoffs for
R-squared, namely, 0.02 for small, 0.13 for medium, and 0.26 for large effect sizes.

The current study can be improved in many ways in the future. First, in this chapter, we 
have focused on the power analysis of a single model parameter as this is the most common 
situation. If a researcher wants to test multiple parameters simultaneously, a procedure based on 
the likelihood ratio test can be developed as for the growth curve analysis in Zhang & Wang
(2009).

Second, in the current study, we have focused on the basic univariate and bivariate LCSMs.
Since their invention, the basic univariate and bivariate LCSMs have been extended in many ways. 
For example, Hamagami & McArdle (2007b) expanded the traditional specifications of univariate 
and bivariate LCSMs to the parallel process change score model and the second-order LCSMs. 
Grimm et al. (2012) extended latent difference scores to allow for testing hypotheses where recent 
changes, as opposed to recent levels, are a primary predictor of subsequent changes. The Monte
Carlo procedure used in this study can be flexibly extended to the more advanced models.


Third, the current study has assumed that the collected data will be complete. However, in
practice, missing data are almost unavoidable in longitudinal studies. For example, Puma et al.
(2009) found that student achievement outcomes are often missing at rates of 10-20% in studies
funded by the National Center for Education Evaluation and Regional Assistance. Missing data
reduce power, and without careful consideration, a well-planned study can become under-powered.
Taking missing data into consideration in power calculation requires the specification of a
missing data generating mechanism that can be used in the data generation step of our Monte
Carlo method.
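
As a rough illustration of how a missing data mechanism could be added to the data generation step, the sketch below imposes monotone dropout that is completely at random on a complete data matrix. The function impose_dropout and the per-occasion dropout probability p_drop are hypothetical and are not part of RAMpath; other mechanisms (for example, dropout that depends on earlier scores) would need a different rule.

```r
# Hypothetical helper (not part of RAMpath): monotone dropout that is
# missing completely at random. Y is a complete N x T data matrix.
impose_dropout <- function(Y, p_drop) {
  N <- nrow(Y)
  gone <- rep(FALSE, N)          # nobody has dropped out at occasion 1
  for (t in seq_len(ncol(Y))[-1]) {
    # Each remaining participant drops out with probability p_drop;
    # once a participant is gone, all later occasions stay missing
    gone <- gone | (runif(N) < p_drop)
    Y[gone, t] <- NA
  }
  Y
}

# Example: 1000 participants, 5 occasions, 10% dropout per occasion
set.seed(2)
Y_complete <- matrix(rnorm(1000 * 5), 1000, 5)
Y_missing  <- impose_dropout(Y_complete, p_drop = 0.1)
colMeans(is.na(Y_missing))       # proportion missing at each occasion
```

With p_drop = 0.1 and 5 occasions, about 1 - 0.9^4, or 34%, of participants are expected to be missing at the last occasion. The model would then have to be estimated with a method that accommodates incomplete data.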


Fourth, in our Monte Carlo method, we have assumed that our data are normally distributed.
However, practical data often deviate from a normal distribution. For example, Micceri (1989)
evaluated 440 distributions of large-sample achievement and psychometric measures and found
that all of them were nonnormal. More recently, Blanca et al. (2013) evaluated nonnormality
using the skewness and kurtosis of 693 small samples and found that 94.5% of them violated the
normality assumption. In addition, Cain et al. (2016) reviewed 254 multivariate distributions of
data used in Psychological Science and the American Educational Research Journal and found that
68% of the multivariate distributions deviated from normal distributions. Therefore, in the
future, the influence of non-normal data should be considered when estimating power.
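
One simple way to study nonnormality in the simulation, sketched below under stated assumptions, is to replace the normal measurement errors with standardized chi-square errors, which keep mean 0 and variance 1 but are skewed. The function names rskewed and sample_skewness are hypothetical. The chi-square distribution is only one of many possible nonnormal generators and is not the authors' method.

```r
# Hypothetical generator: standardized chi-square deviates with mean 0 and
# variance 1. Their skewness is sqrt(8 / df) and excess kurtosis is 12 / df.
rskewed <- function(n, df) {
  (rchisq(n, df = df) - df) / sqrt(2 * df)
}

# Sample skewness, used here only to check the generated values
sample_skewness <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

set.seed(4)
# Measurement errors with variance varey = 9 and a skewed distribution
e <- sqrt(9) * rskewed(1e5, df = 3)
c(mean = mean(e), var = var(e), skewness = sample_skewness(e))
```

Such errors could replace the normal errors in the data generation step, so that the power obtained under normality can be compared with the power obtained under a chosen degree of skewness.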


Finally, the Monte Carlo based method can be computationally intensive. For example, it took
about 10 minutes on a modern desktop to complete the power analysis in Example 4. At the same
time, the Monte Carlo method can be easily parallelized to take advantage of modern hardware
such as multi-core processors (e.g., Zhang, 2014). In the future, the R package RAMpath can be
improved with the capacity for parallelization.
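
Because the replications are independent, they can be distributed across cores. The sketch below shows the idea with the parallel package that ships with R. It is not how RAMpath works: one_replication is a hypothetical stand-in that tests a simple mean rather than fitting an LCSM, and the summary names (mc.est, mc.sd, mc.se, mc.power) only mirror the column labels in the output above under an assumed interpretation.

```r
library(parallel)

# Hypothetical stand-in for one Monte Carlo replication: simulate data,
# estimate a parameter (here a mean), and test it at level alpha.
one_replication <- function(i, N, mu, alpha) {
  x   <- rnorm(N, mean = mu)
  est <- mean(x)
  se  <- sd(x) / sqrt(N)
  p   <- 2 * pt(-abs(est / se), df = N - 1)
  c(est = est, se = se, sig = as.numeric(p < alpha))
}

# Run R replications on `cores` processes and summarize them
mc_power_parallel <- function(R, N, mu, alpha = 0.05, cores = 1) {
  reps <- mclapply(seq_len(R), one_replication,
                   N = N, mu = mu, alpha = alpha, mc.cores = cores)
  reps <- do.call(rbind, reps)
  c(mc.est   = mean(reps[, "est"]),   # average estimate
    mc.sd    = sd(reps[, "est"]),     # SD of the estimates
    mc.se    = mean(reps[, "se"]),    # average standard error
    mc.power = mean(reps[, "sig"]))   # proportion of significant tests
}

set.seed(3)
out <- mc_power_parallel(R = 200, N = 100, mu = 0.5, cores = 1)
print(round(out, 3))
```

Setting cores above 1 distributes the replications over several processes on Unix-like systems. On Windows, mclapply supports only cores = 1, and a cluster-based function such as parLapply would be needed instead.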
Table 1

Population parameter values used in Examples 1-4

           Example 1   Example 2
betay      0           0.1
my0        0           20
mys        0           1.5
varey      1           9
vary0      1           2.5
varys      1           0.05
vary0ys    0           0

Table 2

Population parameter values used in Example 5

Parameter   Value    Parameter   Value
betay       0.08     betax
gammax      0        gammay
my0         20       mx0
mys         1.5      mxs
varey       9        varex
vary0       3        varx0
varys       0.05     varxs
vary0ys     0        varx0xs
varx0y0     1
varx0ys     0
vary0xs     0
varxsys     0

Figure 1. The trajectory plot of latent scores y[t] from time 1 to time 5 with different β_y values.

Figure 2. Path diagram for a univariate latent change score model.

Figure 3. The path diagram for a bivariate latent change score model.

Figure 4. The online interface for power analysis for univariate latent change score models.
[Screenshot of the "Univariate Latent Change Score Model" form with input fields for sample size, number of occasions, number of replications, betay, my0, mys, varey, vary0, varys, vary0ys, significance level, power, and power curve, plus a note field and a Calculate button.]

Figure 5. Power curve for betay (left plot) and mys (right plot) along with the sample size in
the univariate latent change score model.

Figure 6. Power curve for betay (left plot) and mys (right plot) along with the number of
measurement occasions in the univariate latent change score model.